Random Forests for CUDA GPUs

Authors

  • Daniel Slat
  • Mikael Hellborg Lapajne
  • Håkan Grahn
Abstract

Context. Machine learning is a complex, resource-consuming process that requires substantial computing power. As the volume of information keeps growing, the need for efficient, high-performance algorithms is increasing. Today's commodity graphics cards are parallel multiprocessors that offer high computing capacity at an attractive price and are usually pre-installed in new PCs. They provide an additional resource for machine learning applications. The Random Forest learning algorithm, which has been shown to be competitive within machine learning, has good potential for performance gains through parallelization. Objectives. In this study we implement and review a revised Random Forest algorithm for GPU execution using CUDA. Methods. We reviewed previous work in the area by studying articles from several sources, including Compendex, Inspec, IEEE Xplore, ACM Digital Library and Springer Link. Additional information on GPU architecture and implementation-specific details was obtained mainly from Nvidia's documentation and the Nvidia developer forums. The implemented algorithm was benchmarked against two state-of-the-art CPU implementations of the Random Forest algorithm, comparing both the time consumed for training and classification and the classification accuracy. Results. Benchmark measurements for the three implementations are presented, showing their performance on two publicly available data sets. Conclusion. We conclude that our implementation, under the right conditions, is able to outperform its competitors. We also conclude that this holds only for certain data sets, depending on their size. Moreover, we conclude that there is potential for further improvement of the algorithm, both in performance and in adaptation to a wider range of real-world applications.

Similar resources

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...

An approach to Improve Particle Swarm Optimization Algorithm Using CUDA

The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...

Parallel tree-ensemble algorithms for GPUs using CUDA

We present two new parallel implementations of the tree-ensemble algorithms Random Forest (RF) and Extremely randomized trees (ERT) for emerging many-core platforms, e.g., contemporary graphics cards suitable for general-purpose computing (GPGPU). Random Forest and Extremely randomized trees are ensemble learners for classification and regression. They operate by constructing a multitude of dec...

Isolated Persian/Arabic handwriting characters: Derivative projection profile features, implemented on GPUs

For many years, researchers have studied high-accuracy methods for recognizing handwriting and have achieved many significant improvements. However, an issue that has rarely been studied is the speed of these methods. Given computer hardware limitations, these methods need to run at high speed. One way to increase processing speed is to use the computer pa...

Implementation of Random Linear Network Coding Using NVIDIA's CUDA Toolkit

In this paper we describe an efficient GPU-based implementation of random linear network coding using NVIDIA's CUDA toolkit. The implementation takes advantage of the highly parallel nature of modern GPUs. The paper reports speedups of 500% for encoding and 90% for decoding in comparison with a standard CPU-based implementation.

MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling

Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable drawback of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...

Publication date: 2010